
Identifying Mislabeled Training Data



Abstract

This paper presents a new approach to identifying and eliminating mislabeled training instances for supervised learning. The goal of this approach is to improve classification accuracies produced by learning algorithms by improving the quality of the training data. Our approach uses a set of learning algorithms to create classifiers that serve as noise filters for the training data. We evaluate single algorithm, majority vote and consensus filters on five datasets that are prone to labeling errors. Our experiments illustrate that filtering significantly improves classification accuracy for noise levels up to 30 percent. An analytical and empirical evaluation of the precision of our approach shows that consensus filters are conservative at throwing away good data at the expense of retaining bad data and that majority filters are better at detecting bad data at the expense of throwing away good data. This suggests that for situations in which there is a paucity of data, consensus filters are preferable, whereas majority vote filters are preferable for situations with an abundance of data.
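The filtering scheme the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes k-nearest-neighbour classifiers with different `k` as a stand-in for the paper's set of learning algorithms, 1-D numeric instances, and a simple cross-validation loop in which each held-out instance is flagged as noise when a majority (or the consensus) of the filter classifiers disagrees with its recorded label.

```python
from collections import Counter

def nn_predict(train, x, k):
    """Predict the label of point x by a k-nearest-neighbour vote
    over 1-D training points; train is a list of (x, label) pairs."""
    neigh = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(lbl for _, lbl in neigh).most_common(1)[0][0]

def filter_noise(data, ks=(1, 3, 5), scheme="majority", folds=5):
    """Cross-validated ensemble filter (illustrative sketch).

    Each instance is held out once; several classifiers (here k-NN
    with different k, standing in for distinct learning algorithms)
    are trained on the remaining folds and predict its label.
    'majority': discard the instance if most classifiers disagree
    with its recorded label. 'consensus': discard only if all do.
    Returns the retained (x, label) pairs.
    """
    kept = []
    for i in range(folds):
        test = data[i::folds]
        train = [p for j, p in enumerate(data) if j % folds != i]
        for x, lbl in test:
            errs = sum(nn_predict(train, x, k) != lbl for k in ks)
            if scheme == "majority":
                noisy = errs > len(ks) / 2
            else:  # consensus
                noisy = errs == len(ks)
            if not noisy:
                kept.append((x, lbl))
    return kept
```

With two well-separated clusters and one deliberately mislabeled point, both filters discard the bad instance and keep the rest; on noisier, overlapping data the consensus filter would keep more instances (retaining some bad data), matching the trade-off the abstract describes.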
